Chen Maotang, Zheng Sheng’an, You Litong, Wang Jingyu, Yan Tian, Tu Yaofeng, Han Yinjun, Huang Linpeng. A Distributed Persistent Memory File System Based on RDMA Multicast[J]. Journal of Computer Research and Development, 2021, 58(2): 384-396. DOI: 10.7544/issn1000-1239.2021.20200369
1(Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240)
2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
3(ZTE Corporation, Nanjing 210012)
Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003302) and the SJTU-Huawei Innovation Research Lab Project (FA2018091021-202004).
The development of persistent memory and remote direct memory access (RDMA) provides new opportunities for designing efficient distributed systems. However, existing RDMA-based distributed systems are far from fully exploiting RDMA multicast capabilities, which makes it difficult for them to handle one-to-many transmission of multi-copy file data efficiently and degrades system performance. This paper proposes MTFS, a distributed file system based on persistent memory and RDMA multicast transmission. MTFS transmits data to different data nodes through a low-latency multicast transmission mechanism that makes full use of the RDMA multicast capability, avoiding the high latency caused by multi-copy file data transmission operations. To improve the flexibility of transmission operations, a multi-mode multicast remote procedure call (RPC) mechanism is proposed, which adaptively recognizes RPC requests and moves transmission operations off the critical path to further improve transmission efficiency. MTFS also provides a lightweight consistency guarantee mechanism. With a crash recovery mechanism, a data verification module, and a retransmission scheme, MTFS can recover quickly from a crash and achieves file system reliability and data consistency through error detection and data correction. Experimental results show that MTFS improves throughput by 10.2-219 times compared with GlusterFS, outperforms NOVA by 10.7% on the Redis workload, and scales well in multi-threaded workloads.